PHONEME TABLES
A phoneme table defines all the phonemes which are used by a language, together with their properties and the data for their production as sounds.
Generally each language has its own phoneme table, although additional phoneme tables can be used for different voices within the language. These alternatives are referenced from Voices files.
A phoneme table does not need to define all the phonemes used by a language. Instead it can reference a previously defined phoneme table, whose phonemes it inherits. These can then be used as they are, or overridden by new definitions, or new phonemes added. For example, a phoneme table may redefine (or add) some of the vowels that it uses, but inherit most of its consonants from a standard set.
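The inheritance described above can be pictured as a lookup that falls back to the parent table. The following is a minimal sketch only, not how the eSpeak compiler is implemented: the PhonemeTable class, its methods, and the string "definitions" are hypothetical names invented for this illustration.

```python
class PhonemeTable:
    """A named set of phoneme definitions with an optional parent table."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # previously defined table, or None
        self.phonemes = {}        # phoneme name -> definition

    def define(self, phoneme, definition):
        """Add a new phoneme, or override one inherited from the parent."""
        self.phonemes[phoneme] = definition

    def lookup(self, phoneme):
        """Search this table first, then each ancestor in turn."""
        table = self
        while table is not None:
            if phoneme in table.phonemes:
                return table.phonemes[phoneme]
            table = table.parent
        raise KeyError(phoneme)


# Placeholder definitions; real ones hold properties and sound data.
base = PhonemeTable("base")
base.define("s", "base consonant s")
base.define("a", "base vowel a")

en = PhonemeTable("en", parent=base)
en.define("a", "English vowel a")        # overrides the inherited vowel
en.define("aI", "English diphthong aI")  # adds a new phoneme

print(en.lookup("s"))  # inherited unchanged from the base table
print(en.lookup("a"))  # the overriding definition
```

Overriding a phoneme in the child table leaves the parent untouched, so other tables that inherit from the same parent still see the original definition.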
Note: This specification is not yet complete and does not include the definitions of the formant sequence specifications.
The source files for the phoneme data are in the "phsource" directory in the espeakedit download package.
Phoneme files
The phoneme tables are defined in a master phoneme file, named phonemes. This starts with the base phoneme table followed by other phoneme tables for languages and voices which inherit phonemes from the base table or from each other.
In addition to phoneme definitions, the phoneme file can contain the following:
- include <filename>
- Includes the text of the specified file at this point. This allows different phoneme tables to be kept in different text files, for convenience. <filename> is a relative path. The included file can itself contain include statements.
- phonemetable <name> <parent>
- Starts a new phoneme table, and ends the previous table.
<name> is the name of this phoneme table. This name is used in Voices files.
<parent> is the name of a previously defined phoneme table whose phoneme definitions are inherited by this one. The name base indicates the first (base) phoneme table.
- phonemenumber <integer>
- This statement is used at the start of the master phonemes file to define some specific code numbers for various phonemes which are used directly within the speak program.
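As an illustration of how the include and phonemetable statements partition the master file, here is a rough sketch that splits phoneme file text into tables. It is an assumption-laden toy, not the real compiler: the split_tables function, the read_file callback, and the file name ph_english used in the example are all hypothetical, and lines before the first phonemetable statement are simply assigned to the base table.

```python
def split_tables(text, read_file=None):
    """Split phoneme file text into [name, parent, body_lines] entries.

    read_file is a callable that returns the text of an included file.
    """
    tables = [["base", None, []]]   # the file starts with the base table
    # Keep unread lines on a stack so included text can be spliced in.
    pending = list(reversed(text.splitlines()))
    while pending:
        line = pending.pop().strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "include" and len(words) == 2:
            if read_file is None:
                raise ValueError("include used but no read_file given")
            # Insert the included text at this point (may nest).
            pending.extend(reversed(read_file(words[1]).splitlines()))
        elif words[0] == "phonemetable" and len(words) == 3:
            # Starts a new table, which also ends the previous one.
            tables.append([words[1], words[2], []])
        else:
            tables[-1][2].append(line)
    return tables


# Hypothetical file contents, for illustration only.
files = {"ph_english": "phoneme aI\nendphoneme\n"}
master = "phoneme s\nendphoneme\nphonemetable en base\ninclude ph_english\n"
for name, parent, body in split_tables(master, files.__getitem__):
    print(name, parent, body)
```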
Phoneme definitions
A phoneme table contains a list of phoneme definitions. Each starts with the keyword phoneme and the phoneme name (this is the name used in the pronunciation rules), and ends with the keyword endphoneme. For example:
  phoneme aI
    vowel
    length 230
    formants vowels/ai
    starttype (a) endtype (I)
  endphoneme

  phoneme s
    vls alv frc sibilant
    vowelin  f1=0 f2=1700 -300 300 f3=-100 100
    vowelout f1=0 f2=1700 -300 250 f3=-100 100 rms=20
    lengthmod 3
    wave unvoc/s
    before _ unvoc/s_
    before p unvoc/s!
    before t unvoc/s!
    before k unvoc/s!
    switchvoicing z
  endphoneme
Within the phoneme definition the following lines may occur. (V) indicates a line that applies only to vowels, (C) one that applies only to consonants.
- Type. One of these must be present.
vowel |
liquid | semi-vowels, such as: r, l, j, w |
nasal | nasal eg: m, n, N |
stop | stop eg: p, b, t, d, k, g |
frc | fricative eg: f, v, T, D, s, z, S, Z, C, x |
afr | affricate eg: tS, dZ |
pause | |
stress | stress symbols, eg: ' , = % |
virtual | Used to represent a class of phonemes. See the section "Phoneme Pairs", below. |
- Properties:
vls | (C) voiceless eg. p, t, k, f, s |
vcd | (C) voiced eg. b, d, g, v, z |
sibilant | (C) eg: s, z, S, Z, tS, dZ |
palatal | (C) A palatal or palatalized consonant. |
unstressed | (V) This vowel is always unstressed, unless explicitly marked otherwise. |
nolink | Prevent any linking from the previous phoneme. |
trill | (C) Apply trill to the voicing. |
- Place of Articulation (C):
blb | bi-labial |
lbd | labio-dental |
dnt | dental |
alv | alveolar |
rfx | retroflex |
pla | palato-alveolar |
pal | palatal |
vel | velar |
lbv | labio-velar |
uvl | uvular |
phr | pharyngeal |
glt | glottal |
- length
- (V) The relative length of the phoneme, typically about 140 for a short vowel and from 200 to 250 for a long vowel or diphthong. Currently used only for vowels.
- formants <sound spec>
- <sound spec> is a relative path to a file which defines how to generate the sound (a vowel or voiced consonant) from a sequence of formant values. (see **)
- wave <wavefile>
- (C) This is an alternative to formants. <wavefile> is a relative path to a WAV file (22 kHz, 16 bits) which will be played to produce the sound. This method is used for unvoiced consonants. <wavefile> does not include a .WAV filename extension, although the file to which it refers may or may not have one.
- before <phoneme> <sound spec>
- This specifies an alternative realization when the phoneme is followed by another specified phoneme. before may be followed by several <phoneme> <sound spec> pairs.
- after <phoneme> <sound spec>
- This specifies an alternative realization when the phoneme follows another specified phoneme. Vowels are considered as two parts, start and end, so both a before and an after condition may apply to the same vowel.
- starttype <phoneme>
- Allocates this phoneme to a category for the purposes of choosing the variant of a phoneme that precedes it. See section "Phoneme Pairs" below.
- endtype <phoneme>
- Allocates this phoneme to a category for the purposes of choosing the variant of a phoneme that follows it. See section "Phoneme Pairs" below.
- reduceto <phoneme> <level>
- (V) Change to the specified phoneme (such as schwa, @) if this syllable has a stress level less than that specified by <level>.
- linkout <phoneme>
- If the following phoneme is a vowel then this additional phoneme will be inserted before it.
- beforevowel <phoneme>
- The phoneme changes to this one if the next phoneme is a vowel.
- beforevowelpause <phoneme>
- Change to this if the next phoneme is a vowel or pause.
- beforenotvowel <phoneme>
- Change to this if the next phoneme is not a vowel.
- lengthmod <integer>
- (C) Determines how this consonant affects the length of the previous vowel. This value is used as an index into the length_mods table in the CalcLengths() function in the speak program.
- vowelin <vowel transition data>
- (C) Specifies the effects of this consonant on the formants of a following vowel. See "Vowel Transitions", below.
- vowelout <vowel transition data>
- (C) Specifies the effects of this consonant on the formants of a preceding vowel. See "Vowel Transitions", below.
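The substitution statements in the list above (reduceto, beforevowel, beforevowelpause, beforenotvowel) can be read as simple context rules. The sketch below shows one possible reading. The substitute function, the dictionary layout, the numeric stress scale, and the phoneme name "r-" are assumptions made for illustration, and the order in which the rules are tried is arbitrary, since this document does not specify it.

```python
def substitute(phoneme, rules, next_type, stress_level=None):
    """Return the phoneme name to use in the given context.

    rules:        dict mapping a statement keyword to its argument(s)
    next_type:    type keyword of the next phoneme ("vowel", "pause", ...)
    stress_level: stress level of this syllable (assumed numeric scale)
    """
    next_is_vowel = next_type == "vowel"
    if "reduceto" in rules and stress_level is not None:
        target, level = rules["reduceto"]
        if stress_level < level:          # stress below the given level
            return target
    if "beforevowel" in rules and next_is_vowel:
        return rules["beforevowel"]
    if "beforevowelpause" in rules and next_type in ("vowel", "pause"):
        return rules["beforevowelpause"]
    if "beforenotvowel" in rules and not next_is_vowel:
        return rules["beforenotvowel"]
    return phoneme                        # no rule applies


# A vowel that reduces to schwa below an assumed stress level of 4.
print(substitute("a", {"reduceto": ("@", 4)}, "stop", stress_level=2))
# A consonant that changes to a hypothetical "r-" before a non-vowel.
print(substitute("r", {"beforenotvowel": "r-"}, "stop"))
```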
Phoneme Pairs
The pronunciation of a phoneme can depend on the phonemes before and after it. Some of this modification is done automatically - the program automatically adjusts the beginning and end of a vowel to match its adjacent sounds. You can also specify variant pronunciations in the phoneme table.
The before and after statements can specify different sound variants to be used when the phoneme is before or is after another specified phoneme. The adjacent phoneme named in a before or after statement may stand not just for itself, but for other phonemes too. For example:
  before ; unvoc/s;
means that the sound unvoc/s; is used (rather than unvoc/s) if the following phoneme is [;]. But this rule also applies if the next phoneme is another type of pause, [_] or [;;]. This is because these two include a line starttype ; in their phoneme specifications. This means that they look like a [;] to a preceding phoneme.
When looking for a matching before or after rule, if an exact match is not found, then a match is looked for by replacing either or both of the two phonemes by their starttype and endtype groups as appropriate.
Virtual phonemes can be defined for use in starttype and endtype statements. For example, a virtual phoneme [(i)] is used to represent vowels which start with or end with an [i] type sound. So [i:] and [I] have starttype (i), and those, plus diphthongs such as [aI] [eI] [OI], have endtype (i). By convention, names of virtual phonemes include a pair of round brackets.
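The matching of before rules described in this section can be sketched as an exact lookup followed by a fallback to the neighbour's starttype group. This is a simplified illustration, not the speak program's code: the before_variant function and its dictionary arguments are hypothetical, and only the following phoneme is replaced by its group, whereas the text says that either or both phonemes of the pair may be replaced.

```python
def before_variant(default_spec, before_rules, next_phoneme, starttypes):
    """Choose the sound spec for a phoneme, given the phoneme after it.

    before_rules: dict from a phoneme name to a sound spec, as given
                  by the 'before' statements of this phoneme
    starttypes:   dict from a phoneme name to its starttype group
    """
    # 1. An exact match on the following phoneme wins.
    if next_phoneme in before_rules:
        return before_rules[next_phoneme]
    # 2. Otherwise try the group that the following phoneme belongs to.
    group = starttypes.get(next_phoneme)
    if group in before_rules:
        return before_rules[group]
    # 3. No rule applies: use the normal sound.
    return default_spec


# The pause phonemes [_] and [;;] declare 'starttype ;'.
starttypes = {"_": ";", ";;": ";"}
rules = {";": "unvoc/s;"}

print(before_variant("unvoc/s", rules, ";", starttypes))   # exact match
print(before_variant("unvoc/s", rules, ";;", starttypes))  # via starttype
print(before_variant("unvoc/s", rules, "a", starttypes))   # default sound
```

An after rule could be handled in the same way, using the endtype group of the preceding phoneme instead.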
Sound Specifications
There are three ways to produce sounds:
- Playing a WAV file. This is used for unvoiced consonants such as [p] [t] [s].
- Generating a wave from a sequence of formant parameters. This is used for vowels and also for sonorants such as [l] [j] [n].
- A mixture of these. A stored WAV file is mixed with a wave generated from formant parameters. This is used for voiced stops and fricatives such as [b] [g] [v] [z].
A <sound spec> in the phoneme table can refer to a WAV file, a formant sequence, or a mixture of both. It can also include a numeric value to adjust the length of the sound.
Vowel Transitions
These specify how a consonant affects an adjacent vowel. A consonant may cause a transition in the vowel's formants as the mouth changes shape between the consonant and the vowel. The following attributes may be specified. Note that the maximum rate of change of formant frequencies is limited by the speak program.
- len=<integer>
- Nominal length of the transition in ms. If omitted, a default value is used.
- rms=<integer>
- Adjusts the amplitude of the vowel at the end of the transition. If omitted a default value is used.
- f1=<integer>
-
0: f1 formant frequency unchanged.
1: f1 formant frequency decreases.
2: f1 formant frequency decreases more.
- f2=<freq> <min> <max>
-
<freq>: The frequency towards which the f2 formant moves (Hz).
<min>: Signed integer (Hz). The minimum f2 frequency change.
<max>: Signed integer (Hz). The maximum f2 frequency change.
- f3=<change> <amplitude>
-
<change>: Signed integer (Hz). Frequency change of f3, f4, and f5 formants.
<amplitude>: Amplitude of the f3, f4, and f5 formants at the end of the transition. 100 = no change.
- brk
- Break. Do not merge the synthesized wave of the consonant into the vowel. This will produce a discontinuity in the formants.
- rate
- Allow a greater maximum rate of change of formant frequencies.
- glstop
- Indicates a glottal stop.
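To make the layout of a vowelin or vowelout line concrete, the following sketch parses transition data such as that shown for the phoneme s earlier in this document. It is only an illustration of the attribute syntax listed above. The parse_transition function and its output format are assumptions, and the sketch does not check how many values each attribute is allowed to take.

```python
FLAGS = ("brk", "rate", "glstop")   # attributes that take no value


def parse_transition(text):
    """Parse vowel transition data into a dict.

    Attributes of the form name=value map to a list of integers (the
    value after '=' plus any bare integers that follow it). Flag
    attributes map to True.
    """
    result = {}
    key = None
    for token in text.split():
        if "=" in token:
            key, value = token.split("=", 1)
            result[key] = [int(value)]
        elif token in FLAGS:
            result[token] = True
            key = None
        elif key is not None:
            result[key].append(int(token))   # e.g. <min> <max> of f2
        else:
            raise ValueError("unexpected token: " + token)
    return result


data = parse_transition("f1=0 f2=1700 -300 250 f3=-100 100 rms=20")
print(data["f2"])   # target frequency, minimum change, maximum change
print(data["f3"])   # frequency change, amplitude
```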